43 research outputs found

    Wiretap and Gelfand-Pinsker Channels Analogy and its Applications

    Full text link
    An analogy framework between wiretap channels (WTCs) and state-dependent point-to-point channels with non-causal encoder channel state information (referred to as Gelfand-Pinker channels (GPCs)) is proposed. A good sequence of stealth-wiretap codes is shown to induce a good sequence of codes for a corresponding GPC. Consequently, the framework enables exploiting existing results for GPCs to produce converse proofs for their wiretap analogs. The analogy readily extends to multiuser broadcasting scenarios, encompassing broadcast channels (BCs) with deterministic components, degradation ordering between users, and BCs with cooperative receivers. Given a wiretap BC (WTBC) with two receivers and one eavesdropper, an analogous Gelfand-Pinsker BC (GPBC) is constructed by converting the eavesdropper's observation sequence into a state sequence with an appropriate product distribution (induced by the stealth-wiretap code for the WTBC), and non-causally revealing the states to the encoder. The transition matrix of the state-dependent GPBC is extracted from WTBC's transition law, with the eavesdropper's output playing the role of the channel state. Past capacity results for the semi-deterministic (SD) GPBC and the physically-degraded (PD) GPBC with an informed receiver are leveraged to furnish analogy-based converse proofs for the analogous WTBC setups. This characterizes the secrecy-capacity regions of the SD-WTBC and the PD-WTBC, in which the stronger receiver also observes the eavesdropper's channel output. These derivations exemplify how the wiretap-GP analogy enables translating results on one problem into advances in the study of the other

    Information Storage in the Stochastic Ising Model

    Full text link
    Most information systems store data by modifying the local state of matter, in the hope that atomic (or sub-atomic) local interactions would stabilize the state for a sufficiently long time, thereby allowing later recovery. In this work we initiate the study of information retention in locally-interacting systems. The evolution in time of the interacting particles is modeled via the stochastic Ising model (SIM). The initial spin configuration X0X_0 serves as the user-controlled input. The output configuration XtX_t is produced by running tt steps of the Glauber chain. Our main goal is to evaluate the information capacity In(t)β‰œmax⁑pX0I(X0;Xt)I_n(t)\triangleq\max_{p_{X_0}}I(X_0;X_t) when the time tt scales with the size of the system nn. For the zero-temperature SIM on the two-dimensional nΓ—n\sqrt{n}\times\sqrt{n} grid and free boundary conditions, it is easy to show that In(t)=Θ(n)I_n(t) = \Theta(n) for t=O(n)t=O(n). In addition, we show that on the order of n\sqrt{n} bits can be stored for infinite time in striped configurations. The n\sqrt{n} achievability is optimal when tβ†’βˆžt\to\infty and nn is fixed. One of the main results of this work is an achievability scheme that stores more than n\sqrt{n} bits (in orders of magnitude) for superlinear (in nn) times. The analysis of the scheme decomposes the system into Ξ©(n)\Omega(\sqrt{n}) independent Z-channels whose crossover probability is found via the (recently rigorously established) Lifshitz law of phase boundary movement. We also provide results for the positive but small temperature regime. We show that an initial configuration drawn according to the Gibbs measure cannot retain more than a single bit for tβ‰₯ecn14+Ο΅t\geq e^{cn^{\frac{1}{4}+\epsilon}}. On the other hand, when scaling time with Ξ²\beta, the stripe-based coding scheme (that stores for infinite time at zero temperature) is shown to retain its bits for time that is exponential in Ξ²\beta

    Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

    Full text link
    This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating Pβˆ—NΟƒP\ast\mathcal{N}_\sigma, for NΟƒβ‰œN(0,Οƒ2Id)\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d), by P^nβˆ—NΟƒ\hat{P}_n\ast\mathcal{N}_\sigma, where P^n\hat{P}_n is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and Ο‡2\chi^2-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance (W1\mathsf{W}_1) converges at rate eO(d)nβˆ’12e^{O(d)}n^{-\frac{1}{2}} in remarkable contrast to a typical nβˆ’1dn^{-\frac{1}{d}} rate for unsmoothed W1\mathsf{W}_1 (and dβ‰₯3d\ge 3). For the KL divergence, squared 2-Wasserstein distance (W22\mathsf{W}_2^2), and Ο‡2\chi^2-divergence, the convergence rate is eO(d)nβˆ’1e^{O(d)}n^{-1}, but only if PP achieves finite input-output Ο‡2\chi^2 mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to Ο‰(nβˆ’1)\omega(n^{-1}) for the KL divergence and W22\mathsf{W}_2^2, while the Ο‡2\chi^2-divergence becomes infinite - a curious dichotomy. As a main application we consider estimating the differential entropy h(Pβˆ—NΟƒ)h(P\ast\mathcal{N}_\sigma) in the high-dimensional regime. The distribution PP is unknown but nn i.i.d samples from it are available. We first show that any good estimator of h(Pβˆ—NΟƒ)h(P\ast\mathcal{N}_\sigma) must have sample complexity that is exponential in dd. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate eO(d)nβˆ’12e^{O(d)}n^{-\frac{1}{2}}, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158

    Capacity of Continuous Channels with Memory via Directed Information Neural Estimator

    Full text link
    Calculating the capacity (with or without feedback) of channels with memory and continuous alphabets is a challenging task. It requires optimizing the directed information (DI) rate over all channel input distributions. The objective is a multi-letter expression, whose analytic solution is only known for a few specific cases. When no analytic solution is present or the channel model is unknown, there is no unified framework for calculating or even approximating capacity. This work proposes a novel capacity estimation algorithm that treats the channel as a `black-box', both when feedback is or is not present. The algorithm has two main ingredients: (i) a neural distribution transformer (NDT) model that shapes a noise variable into the channel input distribution, which we are able to sample, and (ii) the DI neural estimator (DINE) that estimates the communication rate of the current NDT model. These models are trained by an alternating maximization procedure to both estimate the channel capacity and obtain an NDT for the optimal input distribution. The method is demonstrated on the moving average additive Gaussian noise channel, where it is shown that both the capacity and feedback capacity are estimated without knowledge of the channel transition kernel. The proposed estimation framework opens the door to a myriad of capacity approximation results for continuous alphabet channels that were inaccessible until now

    Max-Sliced Mutual Information

    Full text link
    Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.Comment: Accepted at NeurIPS 202
    corecore